Abstract — We consider distillation of secret bits from partially
secret noisy correlations $P_{ABE}$, shared between two honest parties
and an eavesdropper. The most studied distillation scenario consists
of joint operations on a large number of copies of the distribution,
$(P_{ABE})^n$, assisted with public communication. Here we consider
distillation with only one copy of the distribution, and instead of
rates, the 'quality' of the distilled secret bits is optimized, where
the 'quality' is quantified by the secret-bit fraction of the result.
The secret-bit fraction of a binary distribution is the proportion
which constitutes a secret bit between Alice and Bob. With local
operations and public communication the maximal extractable secret-bit
fraction from a distribution $P_{ABE}$ is found, and is denoted by
$\Lambda[P_{ABE}]$. This quantity is shown to be nonincreasing under
local operations and public communication, and nondecreasing under the
eavesdropper's local operations: $\Lambda$ is a secrecy monotone. It is
shown that if $\Lambda[P_{ABE}] > 1/2$ then $P_{ABE}$ is distillable,
thus providing a sufficient condition for distillability. A simple
expression for $\Lambda[P_{ABE}]$ is found when the eavesdropper is
decoupled, and when the honest parties' information is binary and the
local operations are reversible. Intriguingly, for general
distributions the (optimal) operation requires local degradation of
the data.

Index Terms — Cryptography, privacy amplification, quantum 
information theory, secret-key agreement. 



I. Introduction 

If two parties are to communicate with perfect secrecy over 
an insecure channel, they must share a secret key at least as 
long as the message to be transmitted [21], [1]. It is, however, 
not always necessary for the two parties (Alice (A) and Bob 
(B)) to meet up in order to obtain a shared secret key [22], 
[6], [15]. It might be the case that, secret key aside, the three 
parties (Alice, Bob and Eve (E) the eavesdropper) have access 
to an information source which provides partially correlated 
data to each of them. These correlations can be captured by a
tripartite probability distribution $P_{ABE}$. If Eve has access to
the same information as Alice and Bob, secure key generation 
is impossible. However, there are many possible physical 
scenarios in which this perfect correlation is not present; in 
these cases this difference in knowledge can sometimes be 
exploited to generate secret key. 

Inspired by closely related work by Wyner, and Csiszár
and Körner [22], [6], Maurer [15] presented a protocol for
secret key agreement by public discussion which exploits 
such imperfect knowledge. In his approach Alice and Bob 
are given access to an insecure, authenticated, tamper-proof 
channel and also receive sample data from a distribution 
$P_{ABE}$. In an example, he considers the distribution generated
when a satellite broadcasts the same random bits to each
party but Alice, Bob and Eve receive the information down
binary symmetric channels with bit errors of 20%, 20% and 
15% respectively. Even though Eve's error is less than Alice 
or Bob's, Maurer provides a procedure, called advantage 
distillation, which allows them to obtain shared random bits 
about which Eve knows arbitrarily little. Maurer, with Wolf,
subsequently provided a necessary and sufficient distillability condition
for all distributions created by a combination of a satellite 
producing random bits and local noise [16], [17]. 

Note that it is assumed that all parties know the distribution
$P_{ABE}$. The knowledge they lack is only about particular
samples from the distribution. We will also be making this
assumption throughout the following. This is not an innocent
postulate; though it is sensible to assume that Eve knows
$P_{ABE}$, one need not assume that Alice and Bob know anything
about Eve's data. Advantage distillation requires that Alice and 
Bob have a bound on Eve's error rate. If the physical situation 
prevents them bounding her errors, the parties might be better 
off using quantum cryptography [8]. 

If Alice and Bob want to communicate secretly, they will 
not always have a satellite available to help them generate their 
secret key. The broad question addressed in this paper is then: 
what physical situations can be used to generate secret key? 
Or more precisely, which distributions, $P_{ABE}$, can be used to
generate secret key?

The approach in this paper is rather different from that adopted in
other work (though it is related to a construction in [9] and in [10];
see Section III). In the usual scenario the distillation procedure
consists of joint operations on an arbitrarily large number of copies
of the distribution, $(P_{ABE})^N$, assisted by communication over an
insecure, but authenticated channel. In this context, the secrecy
properties of a distribution $P_{ABE}$ are typically assessed by the
'secret key rate'. This is the maximal rate at which Alice and Bob,
receiving data according to $P_{ABE}$, can generate a key about which
Eve's information is arbitrarily close to zero. By contrast, we
consider distillation in the 'single-copy' scenario, and instead of
rates the protocol optimizes the 'quality' of the distilled secret bit,
where the 'quality' is quantified by the secret-bit fraction of the
result. The secret-bit fraction of $P_{ABE}$ is defined as the maximum
$\tau$ such that there exists a decomposition of $P_{ABE}$ of the form
$P_{ABE} = \tau S_{AB} Q_E + (1 - \tau) H_{ABE}$, where
$\tau \in [0,1]$, $Q_E$ and $H_{ABE}$ can be any probability
distributions and $S_{AB}$ is a shared bit.

Given a distribution $P_{ABE}$, the maximal 'quality' of the secret
bits that can be distilled from it is denoted by $\Lambda[P_{ABE}]$,
and called the 'maximal extractable secret-bit fraction' (MESBF) of
$P_{ABE}$.

We define $\Lambda[P_{ABE}]$ as follows. Suppose Alice, Bob and Eve all
receive one sample from the distribution $P_{ABE}$. Consider the set of
distributions $P'_{ABE}$ that can be obtained from $P_{ABE}$ with some
probability, when Alice and Bob perform local operations and public
communication (LOPC). We allow the probability of obtaining any such
$P'_{ABE}$ to be arbitrarily small as long as it is positive. We call
this class of transformations stochastic-LOPC (SLOPC). We also call
them filtrations or filtering operations. We consider SLOPC
transformations because, as mentioned above, we do not care about the
rates at which the distributions $P'_{ABE}$ can be obtained from
$P_{ABE}$. Instead, we want to know which of the obtainable
distributions $P'_{ABE}$ most resembles a secret bit, and we quantify
this resemblance by the secret-bit fraction. We denote the maximal
secret-bit fraction that can be extracted from $P_{ABE}$ by
$\Lambda[P_{ABE}]$.

If Alice and Bob share a perfectly correlated random bit and Eve is
uncorrelated with them, $\Lambda[P_{ABE}]$ will be 1. If all parties
only have uncorrelated data as outputs then $\Lambda[P_{ABE}] = 1/2$.
Note that the filtrations can sometimes fail. This failure rate is not
reflected in the size of $\Lambda[P_{ABE}]$ since we only consider the
case where the filtration is successful. It follows that distributions
exist with $\Lambda[P_{ABE}]$ equal to 1 but with very low secret key
rates.

One of the main results motivating our use of the MESBF is that if
$\Lambda[P_{ABE}] > 1/2$ then $P_{ABE}$ has a positive secret key rate
(in the asymptotic scenario). The value of $\Lambda[P_{ABE}]$ can thus
be an indicator of whether a distribution has distillable key; however,
it tells us nothing about the size of the secret key rate. A necessary
and sufficient condition for a distribution to have secret key is that
there exists a positive integer $N$ such that
$\Lambda[P^N_{ABE}] > 1/2$, where $P^N_{ABE}$ represents $N$ samples
from $P_{ABE}$ [18].

A very similar quantity called the 'singlet fraction' has been
introduced in entanglement theory in quantum mechanics, in the context
of entanglement distillation [13]. To our surprise we were able to
prove rather more about our classical quantity than has been found for
the quantum case. The connection between entanglement theory and
cryptography is not coincidental and has been investigated at length
(one of the best introductions is [5]). In analogy to bound
entanglement [12], the existence of bound information has been
conjectured [7], [19]. Distributions that can yield no secret key and
yet cannot be created by LOPC show bound information. A distribution
will have bound information if $\Lambda[P^N_{ABE}] = 1/2$ for all $N$
and yet the distribution cannot be generated by LOPC alone. Hence, the
study of $\Lambda$ may prove useful for proving the existence of bound
information.

Let us now highlight the results in this paper. As well as showing (a)
that $\Lambda[P_{ABE}] > 1/2$ implies a positive secret key rate, we
present four further results. (b) We show that $\Lambda[P_{ABE}]$ is a
secrecy monotone under SLOPC by Alice and Bob and under local
operations by Eve. (c) We have a closed expression for
$\Lambda[P_{ABE}]$ for all distributions where Eve is uncoupled, that
is $P_{ABE} = P_{AB} P_E$. In this case, the optimal filtration is also
obtained. (d) We find $\Lambda[P_{ABE}]$ for $P_{ABE}$ where Alice and
Bob's random variables only have two possible outcomes and the parties
are restricted to using filtrations which can be stochastically
reversed. (e) We show that, for general $P_{ABE}$, optimal filtering
operations can sometimes require Alice and Bob to degrade their data
(by partially locally randomizing). This last result is surprising.
One might expect that if Alice and Bob degrade their information they
will have a lower secret-bit fraction; however, this is to neglect the
role of Eve, who might lose, comparatively, even more information. We
provide an example where local randomization improves the secret-bit
fraction of a distribution over that obtained when the data is
reversibly transformed.

A brief outline of the rest of this paper is now given. Section II
introduces the scenario considered, defines the notation, and presents
the first results including the proof that $\Lambda[P_{ABE}]$ is a
secrecy monotone. Section III supplies a sufficient condition for a
distribution to be used to generate secret key. Section IV describes
reversible filtrations, operations which can be successfully undone
with a non-zero probability. The same section finds $\Lambda[P_{ABE}]$
for distributions where Alice and Bob can only have two outcomes and
perform reversible filtrations. Section V finds $\Lambda[P_{AB}]$ when
Eve is decoupled from the communicating parties. The last section of
results, VI, shows that in general, filtrations that yield the MESBF
require the cooperating parties to degrade their data. We conclude by
discussing open problems and investigating interpretations of the
quantity $\Lambda[P_{ABE}]$. The appendices contain some of the longer
proofs; Appendix II is of independent interest as it provides a useful
general decomposition of filtrations.

II. Definitions and basic results 

In the following we define the scenario considered in this paper.
Alice and Bob are connected by an authenticated tamper-proof channel.
The channel is, however, insecure; a third party, Eve, learns all
communicated messages. Alice, Bob and Eve each obtain a letter from
alphabets of sizes $d_A$, $d_B$, and $d_E$ respectively. These outputs
come from a probability distribution $P_{ABE}$. Here, and in what
follows, $A, B, E$ will only appear as labels identifying the parties
sampling from the distribution ($A, B, E$ are not random variables).
The symbols $a, b, e$ will be treated as random variables with
alphabets of size $d_A$, $d_B$, and $d_E$ respectively. The same
symbols $a, b, e$ will also be used to represent particular values of
the random variables. Any particular entry of the vector of
probabilities $P_{ABE}$ will thus be expressed as $P_{ABE}(a, b, e)$.
For convenience, probabilities are allowed to be un-normalized, that
is, the only constraint on $P_{ABE}(a, b, e)$ is that all its entries
are non-negative. Alice and Bob are allowed to perform general local
operations, where by general it is meant that the operation need not
always be successful. Alice's operations can be expressed as a
$d'_A \times d_A$ matrix of non-negative entries, denoted by
$\mathcal{D}_A(a', a)$, where $a' \in \{0, \ldots, d'_A - 1\}$,
$a \in \{0, \ldots, d_A - 1\}$ and $\mathcal{D}_A(a', a) \geq 0$. With
probability $\mathcal{D}_A(a', a)$ the output $a$ is written to $a'$.
Even when normalised, the sum of the elements in each column can be
less than one; this expresses the fact that the operation can fail.
Bob's operations are defined by a similar matrix $\mathcal{J}_B$. When
$\mathcal{D}_A$ and $\mathcal{J}_B$ are applied to $P_{ABE}$, the
components of the resulting distribution are denoted by
$[\mathcal{D}_A \mathcal{J}_B P_{ABE}](a, b, e)$. In the event that
there is no output after filtering, Alice and Bob communicate publicly
and throw away their data. We now provide specific definitions of the
quantities considered in the rest of the paper.
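
The action of a pair of local filters can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary representation of $P_{ABE}$, the function name `apply_filters` and the example numbers are hypothetical and do not come from the paper.

```python
def apply_filters(P, D_A, J_B):
    """Return the unnormalised weights [D_A J_B P](a2, b2, e).

    P maps (a, b, e) -> weight; D_A[a2][a] and J_B[b2][b] are non-negative
    entries whose columns may sum to less than one (the filter can fail).
    """
    out = {}
    for (a, b, e), p in P.items():
        for a2, row_a in enumerate(D_A):
            for b2, row_b in enumerate(J_B):
                w = row_a[a] * row_b[b] * p
                if w > 0:
                    out[(a2, b2, e)] = out.get((a2, b2, e), 0.0) + w
    return out


# Hypothetical example: a shared bit where Alice keeps outcome 1 only half the time.
P = {(0, 0, 0): 0.5, (1, 1, 0): 0.5}
D_A = [[1.0, 0.0], [0.0, 0.5]]
J_B = [[1.0, 0.0], [0.0, 1.0]]
filtered = apply_filters(P, D_A, J_B)
success_probability = sum(filtered.values())  # total weight that survives filtering
```

For a normalised input, the total weight of the output is the probability that neither filter fails, which is the quantity the single-copy scenario deliberately ignores.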

Definition 1 [Secret-bit fraction of a binary distribution].
A distribution where $d_A = d_B = 2$ and $d_E$ is arbitrary is called
'binary'. The secret-bit fraction of the normalized binary
distribution $P_{ABE}$ will be called $\lambda[P_{ABE}]$.
$\lambda[P_{ABE}]$ is the maximum $\tau$ such that there exists a
decomposition of $P_{ABE}$ of the form:

$$P_{ABE} = \tau S_{AB} Q_E + (1 - \tau) H_{ABE}, \tag{1}$$

where $\tau \in [0,1]$, $Q_E$ and $H_{ABE}$ can be any probability
distributions and $S_{AB}(a, b) = \frac{1}{2}\delta_{ab}$ is a shared
bit. The result proved in the following lemma will be used widely in
this paper.

Lemma 1. Given a binary distribution $P_{ABE}$ (not necessarily
normalized) its secret-bit fraction is the following:

$$\lambda[P_{ABE}] = \frac{2\sum_e \min[P_{ABE}(0,0,e), P_{ABE}(1,1,e)]}
{\sum_{a,b,e} P_{ABE}(a,b,e)}. \tag{2}$$

Proof: Notice that $\lambda[\nu P] = \lambda[P]$ for any $\nu > 0$.
Hence we can assume that $P$ is normalized and forget the denominator.
Taking the optimal decomposition (1) and using the fact that the
components of $H_{ABE}$ are non-negative, one can write the following
componentwise inequality: $P_{ABE} \geq \tau S_{AB} Q_E$. Here we have
treated $P_{ABE}$ and $H_{ABE}$ as vectors and $S_{AB} Q_E$ as the
tensor product of two vectors. Let $Q'_E = \tau Q_E$ (recall
$\sum_e Q_E(e) = 1$). It follows that
$P_{ABE}(a, b, e) \geq \frac{1}{2}\delta_{ab} Q'_E(e)$. If $a \neq b$
the inequality is satisfied. If $a = b$, then both
$P_{ABE}(0,0,e) \geq \frac{1}{2} Q'_E(e)$ and
$P_{ABE}(1,1,e) \geq \frac{1}{2} Q'_E(e)$ must hold. It is clear that
the maximum $\tau$ is achieved with
$Q'_E(e) = 2\min[P_{ABE}(0,0,e), P_{ABE}(1,1,e)]$. Substituting this
value of $Q'_E(e)$ into $\sum_e Q'_E(e) = \tau$ completes the proof. ■
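
As a quick numerical companion to Lemma 1, formula (2) can be evaluated directly. The sketch below assumes a hypothetical dictionary representation mapping `(a, b, e)` to a non-negative weight; the function name is illustrative only.

```python
def secret_bit_fraction(P):
    """Evaluate Eq. (2) for a binary distribution.

    P maps (a, b, e) -> non-negative weight with a, b in {0, 1};
    the weights need not be normalised.
    """
    eve_outcomes = {e for (_, _, e) in P}
    shared = sum(
        min(P.get((0, 0, e), 0.0), P.get((1, 1, e), 0.0)) for e in eve_outcomes
    )
    return 2.0 * shared / sum(P.values())


# A perfect shared bit with Eve decoupled.
perfect_bit = {(0, 0, 0): 0.5, (1, 1, 0): 0.5}
# Independent unbiased coins for Alice and Bob.
fair_coins = {(a, b, 0): 0.25 for a in (0, 1) for b in (0, 1)}
# A shared bit that Eve also holds.
known_to_eve = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
```

The three example distributions illustrate the extremes discussed in the text: a shared bit unknown to Eve, uncorrelated coins, and a bit that Eve knows perfectly.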

Now, we want to generalize the notion of secret-bit fraction to
general distributions, not necessarily binary. For this we proceed as
follows. Given a distribution $P_{ABE}$ (not necessarily binary), we
consider all SLOPC protocols whose result is a binary distribution.
Among all these binary distributions obtainable from $P_{ABE}$ by
SLOPC we want to find the one which maximizes the formula (2). Without
loss of generality, any SLOPC protocol can always be decomposed in the
following way. Alice performs the local operation $\mathcal{D}_A^{()}$
and makes public some of her information. One can think that the
outcome of $\mathcal{D}_A^{()}$ has two variables $(a', c_1)$, where
$a'$ is kept secret by Alice, and $c_1$ is broadcast. Later, Bob,
depending on the message $c_1$, performs a local operation
$\mathcal{J}_B^{(c_1)}$ with outcome $(b', c_2)$, and sends the message
$c_2$. Later, Alice, depending on the messages $c_1 c_2$, performs
another local operation $\mathcal{D}_A^{(c_1 c_2)}$, and so on. If at
the end of the protocol none of Alice's and Bob's operations has
failed, for each string of messages $c = (c_1 c_2 c_3 \cdots)$, Alice
has performed a string of operations
$\mathcal{D}_A^{()} \mathcal{D}_A^{(c_1 c_2)} \mathcal{D}_A^{(c_1 c_2 c_3 c_4)} \cdots$.
We denote the product of these matrices by $\mathcal{D}_A^c$, where the
dependence on the public messages is expressed through $c$. Similarly,
we define $\mathcal{J}_B^c$ for Bob. If the initial distribution is
$P_{ABE}$, then the final distribution is
$P'_{ABEC}(a, b, e, c) = [\mathcal{D}_A^c \mathcal{J}_B^c P_{ABE}](a, b, e)$
(here $a$ and $b$ are binary variables). Having settled all this
notation for protocols with communication, we are ready to prove that
communication is not necessary at all.

Lemma 2. In order to find the SLOPC protocol that maximizes
$\lambda$, one need only consider protocols without public
communication.

Proof: Suppose that at the end of a general SLOPC protocol the
distribution obtained is $P'_{ABEC}(a, b, e, c)$, which we can assume
to be normalized. Because the random variable $c$ is public, we have
to consider it as part of Eve's knowledge $(e, c)$. Using formula (2),
the secret-bit fraction of $P'_{ABEC}(a, b, e, c)$ satisfies

$$\begin{aligned}
\lambda[P'_{ABEC}] &= 2\sum_{e,c} \min[P'_{ABEC}(0,0,e,c), P'_{ABEC}(1,1,e,c)] \\
&= \sum_c P_C(c)\, \lambda[P'_{ABEC}(\cdot|c)] \\
&\leq \max_c \lambda[P'_{ABEC}(\cdot|c)],
\end{aligned} \tag{3}$$

where $P'_{ABEC}(\cdot|c)$ denotes the probability distribution for
$ABE$ conditioned on a particular string of messages $c$. If the
maximum in (3) is attained for the value $c_0$, the protocol without
communication consisting of just the local operations
$\mathcal{D}_A^{c_0}$ and $\mathcal{J}_B^{c_0}$ is not worse than the
general one. ■
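
The chain in (3) can be checked numerically on a toy example. The sketch below is only an illustration: the four-variable dictionary `P4`, the message labels `'x'` and `'y'`, and the function names are hypothetical and not part of the paper.

```python
def sbf(P):
    """Eq. (2) for weights P[(a, b, e)] with binary a and b."""
    eve = {k[2] for k in P}
    shared = sum(min(P.get((0, 0, e), 0.0), P.get((1, 1, e), 0.0)) for e in eve)
    return 2.0 * shared / sum(P.values())


def sbf_with_messages(P4):
    """Return (joint, per_message) for weights P4[(a, b, e, c)].

    joint treats the public message c as part of Eve's variable (e, c);
    per_message[c] is the secret-bit fraction conditioned on the message c.
    """
    joint = sbf({(a, b, (e, c)): p for (a, b, e, c), p in P4.items()})
    per_message = {}
    for c in {k[3] for k in P4}:
        conditional = {(a, b, e): p for (a, b, e, c2), p in P4.items() if c2 == c}
        per_message[c] = sbf(conditional)  # Eq. (2) does not need normalisation
    return joint, per_message


# Hypothetical protocol output: message 'x' flags a clean shared bit,
# message 'y' flags anticorrelated data.
P4 = {(0, 0, 0, "x"): 0.3, (1, 1, 0, "x"): 0.3,
      (0, 1, 0, "y"): 0.2, (1, 0, 0, "y"): 0.2}
joint, per_message = sbf_with_messages(P4)
```

Here the joint value is the average of the conditional values weighted by the message probabilities, so it cannot exceed the best conditional value, which is the content of Lemma 2.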

Lemma 2 allows for a simple mathematical definition of the 
principal quantity studied in this paper. 

Definition 2 [The MESBF of a distribution]. The MESBF of $P_{ABE}$ is

$$\Lambda[P_{ABE}] = \sup_{\mathcal{D}_A, \mathcal{J}_B}
\lambda[\mathcal{D}_A \mathcal{J}_B P_{ABE}]. \tag{4}$$

The fact that a supremum, rather than a maximum, is considered in this
definition follows from the requirement that SLOPC transformations
must succeed with probability strictly larger than zero. In some
cases, the optimal SLOPC transformation does not exist, but one can
apply a transformation giving a secret-bit fraction as close as one
wishes to $\Lambda$. (A very similar phenomenon appears for the
'singlet fraction' of quantum states [13] and is called
quasi-distillability.) For any distribution $P_{ABE}$, we know that
$\Lambda[P_{ABE}] \in [1/2, 1]$. The lower bound of 1/2 can always be
obtained if Alice and Bob throw away any data they have and simply
toss unbiased coins. An important fact about $\Lambda$ is that it is a
secrecy monotone.

Theorem 1. The quantity $\Lambda[P_{ABE}]$ has the following
properties:

• $\Lambda[P_{ABE}]$ is nonincreasing when the honest parties perform
local operations and public communication, even if these operations
can fail with some probability (SLOPC).

• $\Lambda[P_{ABE}]$ is nondecreasing when Eve performs local
operations.

Proof: The proof of the first statement comes from the definition of
$\Lambda$, in terms of an optimization over all possible SLOPC
protocols. The second statement can be shown by applying an arbitrary
operation $\mathcal{Y}_E$ to Eve's data and seeing how the secret-bit
fraction changes. $\mathcal{Y}_E$ must not be a filtration, because Eve
cannot make the honest parties reject their data, so each of its
columns sums to one. For a normalized binary distribution,

$$\begin{aligned}
\lambda[\mathcal{Y}_E P_{ABE}] &= 2\sum_{e'} \min\Big[\sum_e \mathcal{Y}_E(e',e) P_{ABE}(0,0,e),\
\sum_e \mathcal{Y}_E(e',e) P_{ABE}(1,1,e)\Big] \\
&\geq 2\sum_{e',e} \mathcal{Y}_E(e',e) \min[P_{ABE}(0,0,e), P_{ABE}(1,1,e)] \\
&= 2\sum_e \min[P_{ABE}(0,0,e), P_{ABE}(1,1,e)],
\end{aligned} \tag{5}$$

where the inequality comes from the concavity of the min function.
Since Eve's operation acts on a different variable from the honest
parties' filters, the same bound holds after any filtration by Alice
and Bob, and therefore for $\Lambda$. ■
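
Inequality (5) can be illustrated numerically. The following sketch makes several assumptions that are not in the paper: the dictionary representation of the distribution, the list-of-rows representation `Y[e2][e]` of Eve's channel, and the two example channels.

```python
def secret_bit_fraction(P):
    """Eq. (2) for weights P[(a, b, e)] with binary a and b."""
    eve = {e for (_, _, e) in P}
    shared = sum(min(P.get((0, 0, e), 0.0), P.get((1, 1, e), 0.0)) for e in eve)
    return 2.0 * shared / sum(P.values())


def apply_eve_channel(P, Y):
    """Y[e2][e] is the probability that Eve rewrites e as e2 (columns sum to 1)."""
    out = {}
    for (a, b, e), p in P.items():
        for e2, row in enumerate(Y):
            if row[e] > 0:
                out[(a, b, e2)] = out.get((a, b, e2), 0.0) + row[e] * p
    return out


# Eve starts with a perfect copy of the shared bit.
P = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
noisy = apply_eve_channel(P, [[0.9, 0.2], [0.1, 0.8]])    # Eve adds noise
erased = apply_eve_channel(P, [[1.0, 1.0], [0.0, 0.0]])   # Eve forgets her data
```

In both cases Eve's processing can only discard information, so the secret-bit fraction seen by Alice and Bob does not go down.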

III. A sufficient condition for distillable secrecy

In this section we provide a sufficient condition for a distribution
$P_{ABE}$ to allow a strictly positive secret key rate between Alice
and Bob. By performing collective operations on sufficiently many
samples from a distribution satisfying this condition, and by
communicating over their insecure channel, Alice and Bob can always
obtain secret key.

Theorem 2. If $\Lambda[P_{ABE}] > 1/2$ then $P_{ABE}$ has distillable
secret key.

If filtrations $\mathcal{D}_A$ and $\mathcal{J}_B$ can be found such
that $\lambda[\mathcal{D}_A \mathcal{J}_B P_{ABE}] > 1/2$ then
$P_{ABE}$ has distillable key. The proof of this theorem is found in
Appendix I. There, we describe a protocol with which one can always
distill a secret key, if the condition of the theorem is satisfied.

On completion of this paper we were made aware of the work of
Holenstein [10], [11]. His work defines two parameters
$(\epsilon, \delta)$ associated with each probability distribution
$P_{ABE}$ and provides a necessary and sufficient condition for the
distribution to have distillable key in terms of these two parameters.
Given a binary distribution $P_{ABE}$ such that

$$P_A(0) = P_B(0) = P_A(1) = P_B(1) = \frac{1}{2}, \tag{6}$$

$$P_{AB}(0,0) = P_{AB}(1,1) \geq \frac{1+\epsilon}{4}, \tag{7}$$

there exists an event $\mathcal{E}$ which implies $A = B$ such that

$$P_{ABE}(\mathcal{E}|A = B) \geq \delta, \tag{8}$$

$$I(A : E|\mathcal{E}) = 0. \tag{9}$$

With these definitions we can get the lower bound

$$\Lambda[P_{ABE}] \geq P_{ABE}(A = B, \mathcal{E}) \geq
\delta\,\frac{1+\epsilon}{2}. \tag{10}$$

Hence, the distillability condition in terms of $\Lambda$ follows from
Holenstein's condition in terms of these two parameters. However, it
is insightful to have the distillability condition in terms of a
single quantity, which is an operationally meaningful secrecy
monotone. We should note that $\Lambda[P_{ABE}]$ is defined through an
optimization over filtrations, unlike Holenstein's two parameters.

IV. MESBF by reversible operations

In this section we introduce a distinction between operations that
degrade the data, and operations that do not. We say that an operation
$\mathcal{D}$ degrades the data if, once it has been applied to the
data, there is no probability that the original data can be recovered.
Operations that do not degrade the data are called reversible.
Mathematically, the operation corresponding to the matrix
$\mathcal{D}$ is reversible if its inverse $\mathcal{D}^{-1}$ has
nonnegative entries. Notice that the fact that the inverse exists does
not mean that the transformation can be undone with probability one;
since rates of distillation are of no concern in the scenario
considered in this paper, the probabilistic nature of the
reversibility is irrelevant.

Of course classical information can always be copied, and 
thus, recovered whatever transformation is applied to it. But, 
if within a particular operation data is copied, this has to be 
represented in the matrix corresponding to this operation. It is 
clear that this kind of operation is always reversible. 

Definition 3 [Reversible stochastic transformations]. A stochastic
transformation $\mathcal{D}$ is reversible if its inverse
$\mathcal{D}^{-1}$ has non-negative entries. This implies that if a
given distribution $P$ is processed with $\mathcal{D}$, we can still
recover $P$ (with some probability of success) by applying
$\mathcal{D}^{-1}$.

As an instance, let us consider transformations on the set of
two-outcome probability distributions. The inverses of 2 × 2 matrices
can be obtained through the following formula

$$\begin{pmatrix} w & x \\ y & z \end{pmatrix}^{-1} =
\frac{1}{wz - xy} \begin{pmatrix} z & -x \\ -y & w \end{pmatrix}. \tag{11}$$

It is easy to see that 2 × 2 operations are reversible if, and only
if, they are diagonal or anti-diagonal. This fact will be used later.
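
The reversibility criterion for 2 × 2 operations is easy to test by machine using formula (11). The helper names below are hypothetical and the sketch is meant only as an illustration of Definition 3.

```python
def inverse_2x2(M):
    """Inverse of [[w, x], [y, z]] via Eq. (11); None if the matrix is singular."""
    (w, x), (y, z) = M
    det = w * z - x * y
    if det == 0:
        return None
    return [[z / det, -x / det], [-y / det, w / det]]


def is_reversible(M):
    """True if M is non-negative and has an inverse with non-negative entries."""
    if any(v < 0 for row in M for v in row):
        return False
    inverse = inverse_2x2(M)
    if inverse is None:
        return False
    return all(v >= 0 for row in inverse for v in row)


diagonal = [[0.5, 0.0], [0.0, 0.2]]
anti_diagonal = [[0.0, 0.5], [0.3, 0.0]]
partly_mixing = [[0.5, 0.1], [0.0, 0.5]]   # sends some weight from outcome 1 to 0
coin_toss = [[0.5, 0.5], [0.5, 0.5]]       # discards the input entirely
```

Any non-zero off-diagonal entry in an otherwise diagonal filter produces a negative entry in the inverse, which is why mixing operations count as degrading the data.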

Definition 4 [Equivalent distributions under reversible operations].
Two probability distributions are called 'equivalent' if there exists
a reversible operation which takes one probability distribution to the
other and vice versa. These equivalence classes have the following
useful property.

Lemma 3. Within an equivalence class all distributions have the same
MESBF.

Proof: Suppose that two equivalent distributions, $P_{ABE}$ and
$P'_{ABE}$, have different MESBF: $\Lambda[P_{ABE}] < \Lambda[P'_{ABE}]$
without loss of generality. This gives a contradiction, because in the
protocol that optimizes $\Lambda[P_{ABE}]$, one can always perform a
first step consisting of going from $P_{ABE}$ to $P'_{ABE}$. ■

In the following we find the MESBF for binary distributions when Alice
and Bob are restricted to performing reversible operations on their
data. For a distribution $P_{ABE}$ we call this quantity
$\Lambda_R[P_{ABE}]$.

Definition 5. The MESBF with reversible operations $\mathcal{R}_A$ and
$\mathcal{V}_B$ is

$$\Lambda_R[P_{ABE}] = \sup_{\mathcal{R}_A, \mathcal{V}_B}
\lambda[\mathcal{R}_A \mathcal{V}_B P_{ABE}]. \tag{12}$$

Theorem 3. Given a binary distribution $P_{ABE}$ the maximum value of
$\lambda$, after reversible filtrations, is as follows:

$$\Lambda_R[P_{ABE}] = \max\left\{
\operatorname*{zmax}_{e' \in \Omega}
\frac{2\sum_e \min[P(0,0,e), \phi_{e'} P(1,1,e)]}
{P(0,0) + \phi_{e'} P(1,1) + 2\sqrt{\phi_{e'} P(0,1) P(1,0)}},\
\operatorname*{zmax}_{e'' \in \mathbb{S}}
\frac{2\sum_e \min[P(0,1,e), \psi_{e''} P(1,0,e)]}
{P(0,1) + \psi_{e''} P(1,0) + 2\sqrt{\psi_{e''} P(0,0) P(1,1)}}
\right\}, \tag{13}$$

where we have suppressed the indices 'A, B, E' on the right hand side
so that $P = P_{ABE}$. The formula is further compressed by writing
$P(a, b) = \sum_e P(a, b, e)$. We define
$\phi_{e'} = \frac{P(0,0,e')}{P(1,1,e')}$ and
$\psi_{e''} = \frac{P(0,1,e'')}{P(1,0,e'')}$. The set $\Omega$ is the
set of all $e$ where both $P(0,0,e) \neq 0$ and $P(1,1,e) \neq 0$. The
set $\mathbb{S}$ is the set of all $e$ where both $P(1,0,e) \neq 0$ and
$P(0,1,e) \neq 0$. The operation $\operatorname{zmax}_{e' \in \Omega}$
is constructed in the following way. It returns the maximum value of
its argument as $e'$ is varied over the set $\Omega$. If $\Omega$ is
empty then the operation is defined as returning '0'. The operation
$\operatorname{zmax}_{e'' \in \mathbb{S}}$ is defined similarly with
regard to the set $\mathbb{S}$.

Corollary 1. In the case where Eve is decoupled,
$P_{ABE} = P_{AB} P_E$, this reduces to:

$$\Lambda_R[P_{AB}] = \begin{cases}
0 & \text{if } P(0,0)P(1,1) = P(0,1)P(1,0) = 0 \\
\max\left\{
\left[1 + \sqrt{\frac{P(0,1)P(1,0)}{P(0,0)P(1,1)}}\right]^{-1},
\left[1 + \sqrt{\frac{P(0,0)P(1,1)}{P(0,1)P(1,0)}}\right]^{-1}
\right\} & \text{otherwise}
\end{cases} \tag{14}$$

Note that both Theorem 3 and Corollary 1 have lower bounds of zero.
This is in contrast to $\Lambda[P_{ABE}] \in [1/2, 1]$ where the lower
bound can always be obtained if Alice and Bob both perform the
irreversible operation of throwing away all data and tossing unbiased
coins. Since irreversible operations are excluded in the definition of
$\Lambda_R[P_{ABE}]$ it takes a lower bound of zero.

Proof of Theorem 3. Let us consider the supremum (12) with the
constraint that $\mathcal{R}_A, \mathcal{V}_B$ are of the form
$\mathcal{R}_A = \mathrm{diag}(\alpha, \beta)$ and
$\mathcal{V}_B = \mathrm{diag}(\gamma, \delta)$ where
$\alpha, \beta, \gamma, \delta > 0$.

$$\begin{aligned}
\Lambda_R[P] &= \sup_{\alpha,\beta,\gamma,\delta}
\frac{2\sum_e \min[\alpha\gamma P(0,0,e), \beta\delta P(1,1,e)]}
{\alpha\gamma P(0,0) + \beta\gamma P(1,0) + \alpha\delta P(0,1) + \beta\delta P(1,1)} \\
&= \sup_{q,r}
\frac{2\sum_e \min[P(0,0,e), r P(1,1,e)]}
{P(0,0) + q P(1,0) + \frac{r}{q} P(0,1) + r P(1,1)},
\end{aligned} \tag{15}$$

where $q = \frac{\beta}{\alpha}$ and
$r = \frac{\beta\delta}{\alpha\gamma}$. We now label the outputs of Eve
so that $\frac{P(0,0,i)}{P(1,1,i)} \leq \frac{P(0,0,i+1)}{P(1,1,i+1)}$
for all $i \in \{0, \ldots, d_E - 2\}$ (if there is an $i$ such that
$P(0,0,i) = P(1,1,i) = 0$ this should be left out of the ordering; if
$P(0,0) = P(1,1) = 0$ then one can readily check that
$\lambda[\mathcal{R}_A \mathcal{V}_B P] = 0$). We will now consider the
function $\lambda[\mathcal{R}_A \mathcal{V}_B P]$ for different ranges
of $r$.

1) For $r \in \left[\frac{P(0,0,i)}{P(1,1,i)},
\frac{P(0,0,i+1)}{P(1,1,i+1)}\right]$, $i \in \{0, \ldots, d_E - 2\}$,
Eq. (15) can be written as:

$$\frac{2\left[\sum_{e \leq i} P(0,0,e) + r \sum_{e > i} P(1,1,e)\right]}
{P(0,0) + q P(1,0) + \frac{r}{q} P(0,1) + r P(1,1)}. \tag{16}$$

2) When $r \in \left[0, \frac{P(0,0,0)}{P(1,1,0)}\right)$ the numerator
of Eq. (15) becomes $2 r P(1,1)$.

3) When $r \in \left[\frac{P(0,0,d_E-1)}{P(1,1,d_E-1)}, \infty\right)$
the numerator of Eq. (15) becomes $2 P(0,0)$.

For each range 1) to 3), by differentiating with respect to $r$,
holding $q$ constant, one can deduce that the maxima are always at one
of the limits of the specified range of $r$. More precisely, the
global maximum of the function in Eq. (15) occurs when
$r = \frac{P(0,0,e')}{P(1,1,e')} = \phi_{e'}$ for a particular
$e' \in \{0, \ldots, d_E - 1\}$. The $r = 0$ and $r = \infty$ limits
correspond to minima.

Restricting the function to the points $r = \phi_{e'}$ one can
differentiate with respect to $q$. Using this one finds that the
maxima occur when $q = \sqrt{\phi_{e'} \frac{P(0,1)}{P(1,0)}}$.
Substituting this into Eq. (15) one obtains the first term in the
'max' in Eq. (13). The '$\operatorname{zmax}_{e' \in \Omega}$'
indicates that we vary over all $e' \in \Omega$. Since we know that
the $r = 0$ and $r = \infty$ limits correspond to minima, $\Omega$ is
constructed to exclude these situations from the allowed values of
$e'$.

We have found the optimal value of
$\lambda[\mathcal{R}_A \mathcal{V}_B P]$ given that $\mathcal{R}_A$ and
$\mathcal{V}_B$ are diagonal. This is not yet $\Lambda_R[P]$ since
there are other possible reversible filtrations $\mathcal{R}_A$ and
$\mathcal{V}_B$.

In this binary case filtering operations are 2 × 2 matrices. As noted
above such filtrations are reversible only if they are diagonal or
anti-diagonal matrices. Some thought shows that, by considering the
case $\mathcal{R}_A$ anti-diagonal and $\mathcal{V}_B$ diagonal, we
will have looked at all distinct reversible operations.

The case where $\mathcal{R}_A = \mathrm{antidiag}(\alpha, \beta)$ and
$\mathcal{V}_B = \mathrm{diag}(\gamma, \delta)$ can be treated using
the tools used in the case where both matrices were diagonal. One
obtains as a result the other term in the 'max' in Eq. (13). Again,
the '$\operatorname{zmax}_{e'' \in \mathbb{S}}$' indicates that we
vary over all $e'' \in \mathbb{S}$. ■
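
Formulas (13) and (14) are straightforward to evaluate numerically. The sketch below is a hedged illustration rather than the authors' implementation: the dictionary representation, the function names, the treatment of a vanishing product in (14) as contributing the limiting value 0, and the example weights are all assumptions made here.

```python
from math import sqrt


def mesbf_reversible(P):
    """Evaluate Eq. (13); P maps (a, b, e) -> weight with binary a and b."""
    eve = sorted({k[2] for k in P})

    def p(ab, e):
        return P.get((ab[0], ab[1], e), 0.0)

    def marginal(ab):
        return sum(p(ab, e) for e in eve)

    def branch(u, v, s, t):
        # u, v: the pair kept as the shared bit; s, t: the remaining two outcomes.
        best = 0.0  # zmax returns 0 when no Eve outcome qualifies
        for e1 in eve:
            if p(u, e1) == 0 or p(v, e1) == 0:
                continue
            ratio = p(u, e1) / p(v, e1)  # phi (first branch) or psi (second branch)
            num = 2.0 * sum(min(p(u, e), ratio * p(v, e)) for e in eve)
            den = (marginal(u) + ratio * marginal(v)
                   + 2.0 * sqrt(ratio * marginal(s) * marginal(t)))
            best = max(best, num / den)
        return best

    return max(branch((0, 0), (1, 1), (0, 1), (1, 0)),
               branch((0, 1), (1, 0), (0, 0), (1, 1)))


def mesbf_reversible_decoupled(p00, p01, p10, p11):
    """Evaluate Eq. (14); a zero product is taken to contribute the limit 0."""
    values = [0.0]
    if p00 * p11 > 0:
        values.append(1.0 / (1.0 + sqrt(p01 * p10 / (p00 * p11))))
    if p01 * p10 > 0:
        values.append(1.0 / (1.0 + sqrt(p00 * p11 / (p01 * p10))))
    return max(values)


# Illustrative unnormalised weights with two outcomes for Eve.
example = {(0, 0, 0): 6, (1, 1, 0): 6, (0, 1, 1): 5, (1, 0, 1): 5, (1, 1, 1): 2}
value = mesbf_reversible(example)
```

For a distribution with a single Eve outcome the two functions should agree, which gives a simple consistency check between Theorem 3 and Corollary 1.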

By definition $\Lambda_R[P] \leq \Lambda[P]$ holds in general. A
reasonable question to pose is, for which distributions $P$ is the
inequality saturated such that $\Lambda_R[P] = \Lambda[P]$? In such
cases, locally degrading the data would not help. In the next section
a class of such distributions is given.

V. The MESBF from private correlations 

In this section we consider the MESBF when Alice and Bob can have
alphabets of any size but they are uncorrelated with the eavesdropper.
Though its proof is nontrivial, the result contained in Theorem 4 is
intuitive. The optimal protocol is to filter only two outcomes. The
result shows that, except for unusual distributions described below,
filtering operations which introduce local randomness offer no
advantage. This is in contrast with the result of the next section
where we find a role for local randomization. In addition we find that
filtering operations which take several outcomes to just one (e.g.
'4' → '0' and '5' → '0') cannot help.






Theorem 4. For distributions $P_{AB}$ where Eve is decoupled, the
MESBF is the following:

$$\Lambda[P_{AB}] = \max_{a_0,b_0,a_1,b_1} \begin{cases}
\frac{1}{2} & \text{if } P(a_0,b_0)P(a_1,b_1) = P(a_0,b_1)P(a_1,b_0) = 0 \\
\left[1 + \sqrt{\frac{P(a_0,b_1)P(a_1,b_0)}{P(a_0,b_0)P(a_1,b_1)}}\right]^{-1}
& \text{otherwise}
\end{cases} \tag{17}$$

where, in the maximization, $a_0, a_1 \in \{0, 1, \ldots, d_A - 1\}$
and $b_0, b_1 \in \{0, 1, \ldots, d_B - 1\}$.

The proof of this Theorem is long and is contained in Appendix III. In
the situation $P(a_0,b_0)P(a_1,b_1) = P(a_0,b_1)P(a_1,b_0) = 0$ local
randomness is useful. Throwing away all data and using local,
unbiased, coin tosses can always obtain a secret-bit fraction of 1/2.

Corollary 2. For $N$ copies of the distribution $P_{AB}$ (represented
as $P^N_{AB}$) where Eve is decoupled the MESBF is:

$$\Lambda[P^N_{AB}] = \max_{a_0,b_0,a_1,b_1} \begin{cases}
\frac{1}{2} & \text{if } P(a_0,b_0)P(a_1,b_1) = P(a_0,b_1)P(a_1,b_0) = 0 \\
\left[1 + \left(\frac{P(a_0,b_1)P(a_1,b_0)}{P(a_0,b_0)P(a_1,b_1)}\right)^{N/2}\right]^{-1}
& \text{otherwise}
\end{cases} \tag{18}$$

Proof: We first note that the expression for $\Lambda[P_{AB}]$ in
Theorem 4 depends monotonically on the quantity
$\omega = \frac{P(a_0,b_1)P(a_1,b_0)}{P(a_0,b_0)P(a_1,b_1)}$. When the
expression is at a maximum, $\omega$ is at a minimum. It is $\omega$
that we will consider in the following. We say that a single copy of a
distribution will have output alphabets of sizes $d_A$ and $d_B$. For
$N$ copies of $P_{AB}$ (the distribution $P^N_{AB}$) $\omega$ becomes:

$$\omega = \frac{P^N(\mathbf{a}_0,\mathbf{b}_1) P^N(\mathbf{a}_1,\mathbf{b}_0)}
{P^N(\mathbf{a}_0,\mathbf{b}_0) P^N(\mathbf{a}_1,\mathbf{b}_1)}, \tag{19}$$

where $\mathbf{a}$ and $\mathbf{b}$ can be viewed as $N$ component
vectors with each entry $a^{(i)}$ and $b^{(i)}$ chosen from alphabets
of sizes $d_A$ and $d_B$ respectively. Thus, by definition
$P^N(\mathbf{a}_0,\mathbf{b}_0) = P^{(1)}(a_0^{(1)},b_0^{(1)})
P^{(2)}(a_0^{(2)},b_0^{(2)}) \cdots P^{(N)}(a_0^{(N)},b_0^{(N)})$,
where $P^{(i)} = P$ is the original single copy distribution; the
superindex $(i)$ appears for counting purposes.

Performing a similar decomposition for the other three terms in
Eq. (19) and with some rearranging one obtains:

$$\omega = \left[\frac{P^{(1)}(a_0^{(1)},b_1^{(1)}) P^{(1)}(a_1^{(1)},b_0^{(1)})}
{P^{(1)}(a_0^{(1)},b_0^{(1)}) P^{(1)}(a_1^{(1)},b_1^{(1)})}\right]
\left[\frac{P^{(2)}(a_0^{(2)},b_1^{(2)}) P^{(2)}(a_1^{(2)},b_0^{(2)})}
{P^{(2)}(a_0^{(2)},b_0^{(2)}) P^{(2)}(a_1^{(2)},b_1^{(2)})}\right]
\cdots
\left[\frac{P^{(N)}(a_0^{(N)},b_1^{(N)}) P^{(N)}(a_1^{(N)},b_0^{(N)})}
{P^{(N)}(a_0^{(N)},b_0^{(N)}) P^{(N)}(a_1^{(N)},b_1^{(N)})}\right]. \tag{20}$$

The maximum value of $\Lambda$ corresponds to the situation where
$\omega$ is a minimum. We note that each square-bracketed term in
Eq. (20) is labeled by the superindex $(i)$ and depends on a different
set of outcomes $a_0^{(i)}, a_1^{(i)}, b_0^{(i)}, b_1^{(i)}$. One can
thus minimize each square-bracketed term in Eq. (20) independently.
Since all of the probability distributions labeled $(i)$ are the same,
one knows that the optimal choice of
$a_0^{(1)}, a_1^{(1)}, b_0^{(1)}, b_1^{(1)}$ for term (1) will also be
the optimum for all terms. Eq. (20) thus becomes
$\omega = \left[\frac{P^{(1)}(a_0^{(1)},b_1^{(1)}) P^{(1)}(a_1^{(1)},b_0^{(1)})}
{P^{(1)}(a_0^{(1)},b_0^{(1)}) P^{(1)}(a_1^{(1)},b_1^{(1)})}\right]^N$.
Dropping the label (1) one obtains Corollary 2. ■

From Corollary 2 one sees that as $N$ increases $\Lambda[P^N_{AB}]$
converges exponentially to 1 if $P_{AB}$ has distillable secrecy.
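
The maximisation in Theorem 4 and its $N$-copy version in Corollary 2 can be carried out by brute force over all outcome pairs. The sketch below is an illustration only: the table representation `P_AB[a][b]`, the function name, and the choice to skip index choices whose denominator product vanishes (their limiting value 0 never beats the coin-toss value 1/2) are assumptions made here, not part of the paper.

```python
from itertools import product


def mesbf_decoupled(P_AB, N=1):
    """Evaluate Eq. (17) for N = 1 and Eq. (18) for N copies.

    P_AB[a][b] is a table of non-negative weights for Alice and Bob;
    Eve is assumed to be decoupled.
    """
    d_A, d_B = len(P_AB), len(P_AB[0])
    best = 0.5  # always reachable by discarding the data and tossing fair coins
    for a0, a1 in product(range(d_A), repeat=2):
        for b0, b1 in product(range(d_B), repeat=2):
            kept = P_AB[a0][b0] * P_AB[a1][b1]       # weight of the "equal" outcomes
            rejected = P_AB[a0][b1] * P_AB[a1][b0]   # weight of the "unequal" outcomes
            if kept == 0:
                continue  # limiting value is 0 (or the 1/2 case), never above best
            omega = rejected / kept
            best = max(best, 1.0 / (1.0 + omega ** (N / 2.0)))
    return best


# Hypothetical noisy shared bit.
noisy_bit = [[0.4, 0.1], [0.1, 0.4]]
single_copy = mesbf_decoupled(noisy_bit)
many_copies = [mesbf_decoupled(noisy_bit, N) for N in (1, 2, 4, 8)]
```

In this example the ratio $\omega$ is 1/16, so each additional pair of copies shrinks the correction term by a factor of 16, which illustrates the exponential convergence towards 1.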

VI. The MESBF for general correlations 

We have no formula for the MESBF for general distributions $P_{ABE}$.
In this section we investigate this case and identify a distribution,
$P_{ABE}$, where irreversible operations obtain a higher secret-bit
fraction than the value obtained by reversible ones alone.

Theorem 4 shows that local randomization has virtually no role in the
protocols that maximize the secret-bit fraction when Eve is decoupled.
One might therefore hope that, on introducing Eve, local randomization
remains unnecessary. At first glance, local randomization in one-shot
protocols seems to serve no role in maximizing the secret-bit
fraction. If Alice and Bob locally degrade their data one might argue
that their secret-bit fraction would inevitably fall. This is
incorrect; in the following we provide an example in which, if Alice
and Bob both locally degrade their data, the value of their secret-bit
fraction is higher than if they perform only reversible operations. In
general, reversible operations are not optimal filtrations. As soon as
Eve is introduced, there is thus a larger role for local randomness in
maximizing the secret-bit fraction of a distribution. A motivation for
this result is the following: though Alice and Bob do indeed become
less correlated as a result of local randomization, Eve becomes even
less correlated with them. Note that local randomization certainly
does have established uses in obtaining good secret key rates in the
multi-copy case [6], where local randomization by one party can
improve the rate.

We will now provide an example where, if Alice and
Bob randomize locally, they can improve their secret-bit
fraction over the value obtained by optimal reversible filtrations.
Before giving the example we introduce the following
notation. Since distributions on three variables do not lend
themselves to easy graphical representation, we let
$P_{ABE} = \sum_{a,b,e} P_{ABE}(a,b,e)\, d_{abe}$, where the orthonormal
vectors $d_{abe}$, $\forall\, a, b, e$, form the standard basis (the vectors each
represent deterministic probability distributions on the variables,
where only the outcomes a, b, e can occur). Consider
the distribution:



Pabe = [(6 d 00 o + 6 duo) + (5 d n + 5 di i + 2 d m )] . 

(21) 

Note that in the first round-bracketed term Eve has '0' and
in the second '1'. Applying formula (13) to this distribution,
one obtains $\Lambda_r[P_{ABE}] = 1/2$. Actually, if Alice and Bob do
nothing, they already have $\lambda[P_{ABE}] = 1/2$ (by Eq. (2)). If both
parties perform the filtration



$$\mathcal{D}_A = \mathcal{F}_B = \begin{bmatrix} 1 & \epsilon \\ 0 & 1 \end{bmatrix} \tag{22}$$
with $\epsilon \approx 0.01$, the transformed distribution, $P'_{ABE}$, has
$\Lambda[P'_{ABE}] > 1/2$. In this case the MESBF is not obtained by
reversible operations. Here the randomization can be viewed
as having the effect that it creates a secret bit between Alice
and Bob when Eve has the outcome '1'. That more general
irreversible filtrations are required to obtain the highest secret-bit
fraction means that the analytical task of finding $\Lambda[P_{ABE}]$
is difficult in general. Finding $\Lambda[P_{ABE}]$ numerically for a
given distribution, $P_{ABE}$, is also difficult as the function to
be optimized is not concave.

VII. Conclusion 

In this section we review the results obtained, outline 
open questions and provide alternative interpretations of the 
MESBF. 

In this paper we have functionally defined a new measure
$\Lambda[P_{ABE}]$ called the MESBF of $P_{ABE}$ and we showed that
it is a secrecy monotone. We showed that if $\Lambda[P_{ABE}] > 1/2$
then the distribution can be used to distill secret key. We
gave a comprehensive characterization of $\Lambda[P_{AB}]$ when Eve
is decoupled and also in the case of reversible operations on
binary distributions. Using the results for reversible operations
we were able to show that there exist distributions for which
the optimal filtration requires local degradation of data. An
open problem is to show that $\Lambda[P_{ABE}] > 1/2$ is not a
necessary condition for distillability; if it were necessary then
the MESBF would be a very useful tool for the investigation
of bound information [9], [19].

In this paper $\Lambda[P_{ABE}]$ has been treated as a measure to
give us yes/no information about whether $P_{ABE}$ can be used
to distill secret key. It can, however, be viewed in two other
ways:

• There is a restricted communication scenario in which 
filtrations of Pabe which maximize the secret-bit frac- 
tion are exactly what the co-operating players would like 
to do in order to make their communication as secret as 
possible: if the parties attempt a form of (a) 'running' key 
generation given (b) unlimited streams of source data but 
(c) finite memories. 

(a) By 'running' we mean that as soon as a successful 
filtration has occurred the random bits are used for en- 
cryption purposes; they are not stored up and then subject 
to information reconciliation and privacy amplification 
[6], [3], [14]. This is, of course, a substantial constraint. 

(b) If there is plenty of source data, the fact that heavy 
filtration might be required to maximize the secret-bit 
fraction is not a problem. 

(c) Their memories must be finite since we consider 
optimal single shot operations. 

In this applied context, the role of local randomization is 
surprising; if Alice and Bob degrade their data they can 
nonetheless improve the secrecy of their communication. 

• Advantage distillation is a standard first step for obtaining
secret key from samples from a general distribution
$P_{ABE}$. The single-shot filtrations that are described here
can be viewed as a generalization of advantage distillation.
A filtration that maximizes the secret-bit fraction of
a distribution can be viewed as an optimal distillation
step (in the scenario where the supply of data is not
limiting). Note that though the approach acts on only
one copy of a distribution, this single copy can be viewed
as many copies of a lower-dimensional distribution. The
fact that introducing local randomness can be helpful in
maximizing the secret-bit fraction raises the intriguing
possibility that degrading data serves a role in generalized
advantage distillation. In the example given, both Alice
and Bob symmetrically add noise. This is distinct from
the case considered in [6] where only one party adds
noise. A future area of research would be to attempt to
identify a distribution where optimal filtrations require
both parties to degrade their data.
Appendix I
Proof of Theorem 2 

In this section we provide a proof of Theorem 2. To do 
so, we explicitly describe the distillation protocol with which 
one can distill secret key from all distributions satisfying the 
condition of the theorem. This protocol might not be efficient, 
but it is enough for our purposes. 

Protocol. The first part of the protocol is similar to advantage
distillation, a procedure introduced in [15]. Alice and
Bob take N samples from their distributions, respectively
$(a_1, a_2, \ldots, a_N)$ and $(b_1, b_2, \ldots, b_N)$. They perform the
following stochastic transformation on their strings:

$$0101\,0101 \cdots \longrightarrow 0 \tag{23}$$
$$1010\,1010 \cdots \longrightarrow 1 \tag{24}$$
$$\text{other} \longrightarrow \text{reject} \tag{25}$$

If both succeed they each keep their final ($N$th) bit, denoted
$a'$ and $b'$. They repeat this procedure many times, obtaining a
long string of pairs $(a', b')$. The reason for alternating 0's and
1's in the above sequences is that, even in the case where
Alice and Bob's marginal is biased, the sequences (23) and
(24) are equiprobable.
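The filter and the equiprobability remark can be sketched in a few lines. The sketch assumes an even block length N and independent samples with a fixed (possibly biased) probability of a 1; the function names, the output label 0 for string (23) and the numerical values are illustrative choices.

```python
def advantage_filter(bits):
    """Map 0101... to 0, 1010... to 1, and any other string to None (reject)."""
    n = len(bits)
    if bits == [i % 2 for i in range(n)]:
        return 0
    if bits == [(i + 1) % 2 for i in range(n)]:
        return 1
    return None

def string_probability(bits, p_one):
    """Probability of a string of independent bits, each equal to 1 with probability p_one."""
    prob = 1.0
    for bit in bits:
        prob *= p_one if bit == 1 else 1.0 - p_one
    return prob

N = 6          # even block length (assumed)
p_one = 0.8    # hypothetical biased marginal
s23 = [i % 2 for i in range(N)]         # 010101
s24 = [(i + 1) % 2 for i in range(N)]   # 101010
prob23 = string_probability(s23, p_one)
prob24 = string_probability(s24, p_one)
```

Both alternating strings contain N/2 zeros and N/2 ones, so `prob23` and `prob24` agree for any bias; for odd N the two counts differ and the strings would no longer be equiprobable.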

The second step of the protocol consists of taking long
strings of pairs $(a', b')$ and performing information reconciliation
and privacy amplification, as described by Csiszár and
Körner in [6]. This second step yields a secret key if, and only
if,

$$H(a'|b') < H(a'|e), \tag{26}$$

where $H(x|y)$ is the Shannon entropy of the random variable
x conditioned on y [6], and e represents all the information
that Eve has at the end of the first step.

Theorem 2. If $\Lambda[P_{ABE}] > 1/2$ then $P_{ABE}$ has distillable
secret key.

Proof: As in Section VI, we represent a distribution as
$P_{ABE} = \sum_{a,b,e} P_{ABE}(a,b,e)\, d_{abe}$, where $d_{abe}$, $\forall\, a,b,e$,
are orthonormal vectors from the standard basis. Consider the
distribution

$$P_{ABE} = \mu\left(\tfrac{1}{2}\, d_{000} + \tfrac{1}{2}\, d_{110}\right) + (1-\mu)\left(\eta_{00}\, d_{001} + \eta_{11}\, d_{112} + \eta_{01}\, d_{013} + \eta_{10}\, d_{104}\right), \tag{27}$$

where $\mu \in (1/2, 1]$, $\sum_{ab} \eta_{ab} = 1$ and $\eta_{ab} \ge 0$. Note that, by
degrading Eve's data, all distributions $P_{ABE}$ with the same
secret-bit fraction $\mu$, and the same marginal for Alice and Bob
(characterized by $\mu$ and $\eta_{ab}$) can be obtained from (27). This
means that if the distribution (27) has distillable secret key,
then any distribution $P'$ with $\lambda[P'] = \mu$ will have distillable
secret key.

In the distribution (27), with probability $1-\mu$ Eve knows
Alice and Bob's bits perfectly, and with probability $\mu$ she only
knows that they are perfectly correlated. The probability that
Alice and Bob have a different outcome is
$\epsilon = (1-\mu)(\eta_{01}+\eta_{10}) \le (1-\mu) < 1/2$. In the following we consider the first
step of the protocol described above. In it, the honest parties
accept their data if they have the string (23) or (24). Let t
be the probability that Alice obtains the string (23); this is
the same as the probability that she obtains (24). The chance
that Alice and Bob accept the same string is $2t(1-\epsilon)^N$, and
the chance that they accept opposite strings is $2t\epsilon^N$. Notice
that these are the only two possibilities that pass the filter;
hence, the probability that both parties accept is
$2t(\epsilon^N + (1-\epsilon)^N)$. The probability that Alice and Bob have different strings
conditioned on the fact that they accept is $\epsilon^N/(\epsilon^N + (1-\epsilon)^N)$.
In other words, Bob's uncertainty about Alice's data is

$$H(a'|b') = h\!\left(\frac{\epsilon^N}{\epsilon^N + (1-\epsilon)^N}\right) \approx N \left(\frac{\epsilon}{1-\epsilon}\right)^{N} \log_2\frac{1-\epsilon}{\epsilon}, \tag{28}$$

where h(r) is the Shannon entropy of the distribution (r, 1-r),
and the approximation holds when N is large. Eve's probability
of knowing nothing, conditioned on the fact that Alice
and Bob have publicly accepted a round of the procedure, is
$\mu^N/(\epsilon^N + (1-\epsilon)^N)$. Hence, her uncertainty about Alice's data
is

$$H(a'|e) = \frac{\mu^N}{\epsilon^N + (1-\epsilon)^N}. \tag{29}$$

The condition for the functioning of the second step of
the distillation protocol is that Bob's uncertainty $H(a'|b')$ is
strictly smaller than Eve's uncertainty $H(a'|e)$. Due to the fact
that $\epsilon \le 1-\mu < \mu$, there exists a sufficiently large N for which
$H(a'|b') < H(a'|e)$ holds. ■
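The existence of a suitable block length can be checked numerically. The sketch below evaluates Bob's uncertainty $h(\epsilon^N/(\epsilon^N+(1-\epsilon)^N))$ and Eve's uncertainty $\mu^N/(\epsilon^N+(1-\epsilon)^N)$ as described in the proof and searches for the smallest N satisfying condition (26). The parameter values, the search bound `n_max` and the function names are hypothetical.

```python
from math import log2

def binary_entropy(r):
    """Shannon entropy (in bits) of the distribution (r, 1 - r)."""
    if r <= 0.0 or r >= 1.0:
        return 0.0
    return -r * log2(r) - (1.0 - r) * log2(1.0 - r)

def uncertainties(mu, eps, n):
    """Return (H(a'|b'), H(a'|e)) after a block of length n has been accepted."""
    norm = eps ** n + (1.0 - eps) ** n
    bob = binary_entropy(eps ** n / norm)
    eve = mu ** n / norm   # Eve is fully ignorant (1 bit) with this probability
    return bob, eve

def smallest_block_length(mu, eps, n_max=200):
    """Smallest n <= n_max with H(a'|b') < H(a'|e), or None if there is none."""
    for n in range(1, n_max + 1):
        bob, eve = uncertainties(mu, eps, n)
        if bob < eve:
            return n
    return None

mu, eps = 0.6, 0.3   # hypothetical values satisfying eps <= 1 - mu < mu
n_star = smallest_block_length(mu, eps)
```

For these values a single sample does not satisfy (26), since $h(0.3) > 0.6$, but a short block already does, in line with the argument that Bob's uncertainty decays faster than Eve's.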

Appendix II 
Decomposition of general operations 

In this section we see how a general operation can be
decomposed into a product of more elementary operations.
This decomposition will be used in the proof of Theorem
4. We will use the notation from Section VI. A matrix $\mathcal{M}$
can be written as $\sum_{ij} \mathcal{M}_{ij}\, d_i d_j^\dagger$ where $d_i d_j^\dagger$ is an outer
product between the orthonormal vectors $d_i$ and $d_j$ from the
standard basis. Note that here the vectors correspond to
a deterministic distribution for just one party (say Alice) and
thus only one subindex is used.

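The most general filtering operation with input $c \in \{1, \ldots, d\}$,
and a bit as output, is

$$\mathcal{D} = \sum_{c=1}^{d} \left(\mathcal{D}_{0c}\, d_0 + \mathcal{D}_{1c}\, d_1\right) d_c^\dagger \tag{30}$$

with coefficients $\mathcal{D}_{0c}, \mathcal{D}_{1c} \ge 0$, and $\mathcal{D}_{0c} + \mathcal{D}_{1c} \le 1$ for all
$c \in \{1, \ldots, d\}$. For each input c, we specify the bias of its
corresponding output with the following function:

$$\omega_c = \begin{cases} 0 & \text{if } \mathcal{D}_{0c} \ge \mathcal{D}_{1c} \\ 1 & \text{if } \mathcal{D}_{0c} < \mathcal{D}_{1c} \end{cases} \tag{31}$$

For each input c, we quantify how mixed its corresponding
output is with the following quantity:

$$\mu_c = \begin{cases} 0 & \text{if } \mathcal{D}_{0c} = \mathcal{D}_{1c} = 0 \\ \dfrac{\min\{\mathcal{D}_{0c}, \mathcal{D}_{1c}\}}{\mathcal{D}_{0c} + \mathcal{D}_{1c}} & \text{otherwise} \end{cases} \tag{32}$$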

The larger $\mu_c$ is, the more mixed the output (when we input
c). Now, we relabel the input in the following way. First, we
order the values of $c \in \{1, \ldots, d\}$ with decreasing mixing, that
is, $\mu_c \ge \mu_{c+1}$ for $c = 1, \ldots, d-1$. Second, we shift the value
of the input by adding 2: $c \to c + 2$. Let us denote a generic
mixing matrix by:

$$M(\mu) = (1-\mu)\left(d_0 d_0^\dagger + d_1 d_1^\dagger\right) + \mu\left(d_0 d_1^\dagger + d_1 d_0^\dagger\right), \quad \text{with } \mu \in [0, 1/2]. \tag{33}$$
It is clear that we can write Eq. (30) as

$$\mathcal{D} = \sum_{c=2}^{d+1} \left(\mathcal{D}_{0c} + \mathcal{D}_{1c}\right) M(\mu_c)\, d_{\omega_c} d_c^\dagger \tag{34}$$

where the argument of $M(\mu_c)$ is the mixing of input c,
Eq. (32). Consider a (d + 2)-dimensional linear
space with basis vectors $\{d_0, d_1, \ldots, d_d, d_{d+1}\}$. The vectors
$\{d_2, \ldots, d_d, d_{d+1}\}$ correspond to the input, and the vectors
$\{d_0, d_1\}$ correspond to the output. The matrix (34) can be
viewed as a square matrix in this (d + 2)-dimensional space,
with all the non-zero elements contained in a $2 \times d$ sub-matrix.
In this larger space we define the square matrices



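$$\mathcal{C} = \sum_{c'=2}^{d+1} \left(\mathcal{D}_{0c'} + \mathcal{D}_{1c'}\right) d_{c'} d_{c'}^\dagger, \tag{35}$$

$$\mathcal{G}_c = \mathcal{I} + d_{\omega_c} d_c^\dagger, \tag{36}$$

$$\mathcal{W}_c = (1-\nu_c)\left(d_0 d_0^\dagger + d_1 d_1^\dagger\right) + \nu_c\left(d_0 d_1^\dagger + d_1 d_0^\dagger\right) + \mathcal{I}_{\{2,\ldots,d+1\}} \tag{37}$$

for $c = 2, \ldots, d+1$. The numbers $\nu_c$ lie within the range
[0, 1/2]. If a matrix has the subindex $\{c_1, c_2, \ldots\}$, it is understood
that it only has support on the subspace spanned by
$\{d_{c_1}, d_{c_2}, \ldots\}$. For example, $\mathcal{I}$ is the identity matrix on the
whole space, whilst $\mathcal{I}_{\{0,1\}} = d_0 d_0^\dagger + d_1 d_1^\dagger$. One can readily
check the following identity:

$$\mathcal{I}_{\{0,1\}}\, \mathcal{W}_{d+1} \mathcal{G}_{d+1} \cdots \mathcal{W}_2 \mathcal{G}_2\, \mathcal{I}_{\{2,\ldots,d+1\}} = \mathcal{W}_{d+1}\, d_{\omega_{d+1}} d_{d+1}^\dagger + \left[\mathcal{W}_{d+1} \mathcal{W}_d\right] d_{\omega_d} d_d^\dagger + \cdots + \left[\mathcal{W}_{d+1} \mathcal{W}_d \cdots \mathcal{W}_2\right] d_{\omega_2} d_2^\dagger \tag{38}$$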

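We have not yet specified the parameters $\nu_c$. If we set
$\nu_{d+1} = \mu_{d+1}$, then

$$\mathcal{W}_{d+1}\, d_{\omega_{d+1}} d_{d+1}^\dagger = M(\mu_{d+1})\, d_{\omega_{d+1}} d_{d+1}^\dagger .$$

By construction, we know that $\mu_d \ge \mu_{d+1}$. Hence, because the
matrices $M(\mu)$ commute, we can assign to $\nu_d$ the value such
that $\mathcal{W}_{d+1} \mathcal{W}_d\, d_{\omega_d} d_d^\dagger = M(\mu_d)\, d_{\omega_d} d_d^\dagger$. In the same fashion,
we can obtain the values for all the parameters $\{\nu_2, \ldots, \nu_{d+1}\}$
such that $\left[\mathcal{W}_{d+1} \mathcal{W}_d \cdots \mathcal{W}_c\right] d_{\omega_c} d_c^\dagger = M(\mu_c)\, d_{\omega_c} d_c^\dagger$, for
$c = 2, \ldots, d+1$. Finally, we can write the full decomposition of
Eq. (34):

$$\mathcal{D} = \mathcal{I}_{\{0,1\}}\, \mathcal{W}_{d+1} \mathcal{G}_{d+1} \cdots \mathcal{W}_2 \mathcal{G}_2\, \mathcal{C} \tag{39}$$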
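The decomposition (39) can be verified numerically for a small filter. In the sketch below the explicit choice $\nu_c = (\mu_c - \mu_{c+1})/(1 - 2\mu_{c+1})$ is our own derivation from the composition rule $M(a)M(b) = M(a + b - 2ab)$, not a formula stated in the text; the function names and the example filter are hypothetical, and every column of the filter is assumed to have a strictly positive sum.

```python
def matmul(x, y):
    """Plain-Python product of two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def decompose(filt):
    """Return (product, target) for a 2 x d filter `filt`.

    `product` is I_{0,1} W_{d+1} G_{d+1} ... W_2 G_2 C and `target` is the
    relabelled filter embedded in the (d + 2)-dimensional space.
    """
    d = len(filt[0])
    n = d + 2
    cols = []
    for c in range(d):
        d0, d1 = filt[0][c], filt[1][c]
        bias = 0 if d0 >= d1 else 1                # Eq. (31)
        mixing = min(d0, d1) / (d0 + d1)           # Eq. (32), non-degenerate branch
        cols.append((mixing, bias, d0, d1))
    cols.sort(key=lambda col: -col[0])             # decreasing mixing
    mu = {c: cols[c - 2][0] for c in range(2, n)}  # shifted labels 2..d+1
    nu = {d + 1: mu[d + 1]}
    for c in range(d, 1, -1):                      # derived choice of nu_c (assumption)
        gap = 1.0 - 2.0 * mu[c + 1]
        nu[c] = (mu[c] - mu[c + 1]) / gap if gap > 0.0 else 0.0
    target = [[0.0] * n for _ in range(n)]
    prod = [[0.0] * n for _ in range(n)]           # starts as C, Eq. (35)
    for c in range(2, n):
        _, _, d0, d1 = cols[c - 2]
        prod[c][c] = d0 + d1
        target[0][c], target[1][c] = d0, d1
    for c in range(2, n):
        g = identity(n)
        g[cols[c - 2][1]][c] = 1.0                 # G_c, Eq. (36)
        w = identity(n)
        w[0][0] = w[1][1] = 1.0 - nu[c]            # W_c, Eq. (37)
        w[0][1] = w[1][0] = nu[c]
        prod = matmul(w, matmul(g, prod))
    proj = [[1.0 if i == j and i < 2 else 0.0 for j in range(n)] for i in range(n)]
    return matmul(proj, prod), target

example = [[0.5, 0.1, 0.3],
           [0.2, 0.6, 0.3]]
product_matrix, target_matrix = decompose(example)
max_error = max(abs(product_matrix[i][j] - target_matrix[i][j])
                for i in range(5) for j in range(5))
```

With this choice each $\nu_c$ stays in [0, 1/2] because the mixings are sorted in decreasing order, which is the role played by the relabelling step.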



In the next section it will prove useful to have a decomposition
of $M(\mu)$. It is clearer to use conventional matrix notation
here.



1 


- 1" 




i 1 






1 


- A* 


1 














1 


- /" 



L P x ~ p J 

1 

M 

this can be further decomposed by noting that: 
1 



1 



(40) 





1 " 











T*- 1 



We will also use the fact that: 



1 
1 

1 



" 1 


M 




' 


1 " 




1 


" 




' 


1 " 





1 




1 







L 


1 




1 






(41) 



(42) 



The operations $\mathcal{W}_c$ can thus be expanded as:

1 



W c = 



1 - v c 


x 





1 - v c 

1 

1 



L l-v c 





\ l-v c > 





1 

' 1 
1 



1 



1 

1 



{0,1} 

+ 2{2,...,d+i} (43) 



Since this decomposition of $\mathcal{W}_c$ will be used repeatedly in
the following proof, we express it more compactly
as:

(44) 



where 

%. 



1 - Vc 



1 

1 - 

1 ' 

1 

1 





1 - v c 


\1-V C > 



{0,1} 



{0,1} 
+ 1{2,...,d+l} 



'-{2,..., d+l} 



H{2 j+i} (45) 

+ l{2,...,d+i] (46) 
(47) 
(48) 
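As a numerical illustration of how a mixing matrix splits into reversible factors and shear-type filters, the sketch below checks one factorization of $M(\mu)$ into diagonal matrices, a permutation and the lower-triangular shear $I + r\, d_1 d_0^\dagger$ with $r = \mu/(1-\mu)$. This particular factorization is an assumption chosen for illustration and is not claimed to coincide term by term with Eqs. (40)-(48); the function names are hypothetical.

```python
def matmul2(x, y):
    """Product of two 2 x 2 matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mixing_matrix(mu):
    """M(mu) of Eq. (33) restricted to the output subspace {0, 1}."""
    return [[1.0 - mu, mu], [mu, 1.0 - mu]]

def factorised_mixing_matrix(mu):
    """Assumed factorization: scale * shear * diag * swap * shear * swap."""
    r = mu / (1.0 - mu)
    scale = [[1.0 - mu, 0.0], [0.0, 1.0 - mu]]   # diagonal
    shear = [[1.0, 0.0], [r, 1.0]]               # I + r d_1 d_0^dagger
    diag = [[1.0, 0.0], [0.0, 1.0 - r * r]]      # diagonal
    swap = [[0.0, 1.0], [1.0, 0.0]]              # permutation of the two outputs
    out = scale
    for factor in (shear, diag, swap, shear, swap):
        out = matmul2(out, factor)
    return out

errors = []
for mu in (0.0, 0.1, 0.2, 0.35, 0.49):
    exact = mixing_matrix(mu)
    approx = factorised_mixing_matrix(mu)
    errors.append(max(abs(exact[i][j] - approx[i][j]) for i in range(2) for j in range(2)))
```

The diagonal entry $1 - r^2$ vanishes at $\mu = 1/2$, so in this sketch the diagonal factor is reversible only for $\mu < 1/2$.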
Appendix III
Proof Of Theorem 4 

In this section we prove Theorem 4. The decomposition 
provided in the previous section will be used extensively. We 
first define more useful quantities, then derive some useful 
consequences and finally provide the proof. 



A. Definitions 

In the previous section we showed that filtrations $\mathcal{D}$, represented
by $2 \times d$ matrices, can be expressed as $(d+2) \times (d+2)$
matrices. These were then decomposed into products of square
matrices as in Eq. (39). Analogously, we will express $P_{AB}$ in
this larger space. We construct the $(d+2) \times (d+2)$ matrix
$\tilde{P}_{AB}$ from $P_{AB}$ as follows:

$$\tilde{P}_{AB}(a,b) = \begin{cases} 0, & \text{if either } a \text{ or } b \in \{0,1\} \\ P_{AB}(a-2,\, b-2), & \text{otherwise} \end{cases} \tag{49}$$

for $a \in \{0, \ldots, d_A+1\}$ and $b \in \{0, \ldots, d_B+1\}$.

We now define a function on general probability distributions,
$P_{AB}$, which have $a \in \{0, \ldots, d_A+1\}$ and
$b \in \{0, \ldots, d_B+1\}$. These general distributions need not satisfy
the promise in Eq. (49) that $P_{AB}(a,b) = 0$ if either a or b $\in \{0,1\}$.

Definition 6. [The function $\vartheta[P_{AB}]$]. Consider a probability
distribution with entries $P_{AB}(a,b)$, where $a \in \{0, 1, \ldots, d_A+1\}$
and $b \in \{0, 1, \ldots, d_B+1\}$. Let us define the following
quantity:

$$\vartheta[P_{AB}] = \max_{a_0,b_0,a_1,b_1} \begin{cases} \dfrac{1}{2} & \text{if } P(a_0,b_0)P(a_1,b_1) = P(a_0,b_1)P(a_1,b_0) = 0 \\[2ex] \dfrac{1}{1 + \sqrt{\dfrac{P(a_0,b_1)\,P(a_1,b_0)}{P(a_0,b_0)\,P(a_1,b_1)}}} & \text{otherwise} \end{cases} \tag{50}$$

where, in the maximization, $a_0, a_1 \in \{0, 1, \ldots, d_A+1\}$ and
$b_0, b_1 \in \{0, 1, \ldots, d_B+1\}$.

We remark that if the distribution $P_{AB}$ were not normalized,
its value of $\vartheta$ would be unchanged; $\vartheta$ is thus well defined on
un-normalized or filtered distributions.
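Definition 6 can be evaluated by brute force for small alphabets. The sketch below reads the degenerate branch of Eq. (50) as the case where both products vanish, and assigns the value 0 when only the denominator vanishes (the ratio diverges); both conventions, the function name and the example matrices are assumptions made for illustration.

```python
from itertools import product
from math import sqrt

def theta(p):
    """Brute-force evaluation of Eq. (50) for a non-negative matrix p (list of rows)."""
    rows, cols = range(len(p)), range(len(p[0]))
    best = 0.0
    for a0, a1, b0, b1 in product(rows, rows, cols, cols):
        num = p[a0][b1] * p[a1][b0]
        den = p[a0][b0] * p[a1][b1]
        if num == 0.0 and den == 0.0:
            value = 0.5                       # degenerate branch, as read here
        elif den == 0.0:
            value = 0.0                       # ratio diverges
        else:
            value = 1.0 / (1.0 + sqrt(num / den))
        best = max(best, value)
    return best

P = [[0.4, 0.1],
     [0.1, 0.4]]
scaled = [[7.0 * x for x in row] for row in P]   # same matrix, not normalized
perfect = [[0.5, 0.0],
           [0.0, 0.5]]                            # perfectly correlated bit
```

Since choosing $a_0 = a_1$ always gives the ratio 1, the maximum is never below 1/2; rescaling the matrix changes numerator and denominator by the same factor, which is the normalization remark above.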

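We will also define a modified form of $\mathcal{D}$:

$$\tilde{\mathcal{D}} = \mathcal{D} + \mathcal{I}_{\{2,\ldots,d+1\}}. \tag{51}$$

Given $\mathcal{D}_A$ and $\mathcal{F}_B$ we can find $\tilde{\mathcal{D}}_A$ and $\tilde{\mathcal{F}}_B$ as above. As noted
above we can also form $\tilde{P}_{AB}$ for $a \in \{0, \ldots, d_A+1\}$ and
$b \in \{0, \ldots, d_B+1\}$ from the distribution $P_{AB}$ using Eq. (49). We
now note that:

1) $(\tilde{\mathcal{D}}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB})(a,b) = (\mathcal{D}_A \mathcal{F}_B P_{AB})(a,b)$ for $a, b \in \{0,1\}$

2) $(\tilde{\mathcal{D}}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB})(a,b) = \tilde{P}_{AB}(a,b) = P_{AB}(a-2,\, b-2)$
for $a \in \{2, \ldots, d_A+1\}$ and $b \in \{2, \ldots, d_B+1\}$.

3) $(\tilde{\mathcal{D}}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB})(a,b) = 0$ otherwise.

Here, an expression of the form $(\mathcal{D}_A \mathcal{F}_B P_{AB})(a,b)$ identifies
the entry (a, b) of the un-normalized matrix yielded by the
filtrations $\mathcal{D}_A \mathcal{F}_B$ on $P_{AB}$.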



B. Preparatory remarks and lemmas 

In this subsection we will prove a few basic results using 
the objects defined in the previous subsection. These will then 
be applied in the next subsection to prove Theorem 4. 

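We will now show that:

$$\Lambda_r[\mathcal{D}_A \mathcal{F}_B P_{AB}] = \vartheta[\tilde{\mathcal{D}}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB}], \tag{52}$$

where $\tilde{\mathcal{D}}_A$ is formed from $\mathcal{D}_A$ as in Eq. (51) and $\tilde{\mathcal{F}}_B$ similarly.
The distribution $\tilde{P}_{AB}$ is formed from $P_{AB}$ as in Eq. (49).
Eq. (52) follows from the fact that $\tilde{\mathcal{D}}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB}$ contains the
entries of $\mathcal{D}_A \mathcal{F}_B P_{AB}$ (as noted in point 1 of the preceding
subsection) and the fact that Eq. (50) is the same function
as Eq. (14) if the optimal values of $a_0, a_1, b_0, b_1$ are 0 and 1
(Eq. (14) returns the value of $\Lambda_r[P_{AB}]$ if $P_{AB}$ is a binary
distribution).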

The following three lemmas will be used in the proof of 
Theorem 4. 

Lemma 4. When either permutation matrices or diagonal
matrices with entries in the range (0, 1] operate on
$P_{AB}$, $\mathcal{R}_A \mathcal{V}_B P_{AB}$, they leave $\vartheta[P_{AB}]$ unaltered. Here
$a \in \{0, 1, \ldots, d_A+1\}$ and $b \in \{0, 1, \ldots, d_B+1\}$ and $P_{AB}$ is a
general distribution on these outcomes.

Proof: This can be checked by looking at the structure of the
function $\vartheta$, noting that: (a) since the maximization in
$\vartheta$ varies over all $a_0, b_0, a_1, b_1$, permutations on $P_{AB}$ have no
effect; (b) the quantity $\frac{P(a_0,b_1)\,P(a_1,b_0)}{P(a_0,b_0)\,P(a_1,b_1)}$ is unaltered by the
operations defined by diagonal matrices. ■

We will introduce the following definition which will be
used in Lemma 5.

$$\mathcal{T}(r) = \left(d_0 d_0^\dagger + d_1 d_1^\dagger\right) + r\, d_1 d_0^\dagger + \mathcal{I}_{\{2,\ldots,d+1\}}, \tag{53}$$

for $r > 0$. Though we call this a 'filtration', note that
$\mathcal{T}_{00} + \mathcal{T}_{10} > 1$. This relaxed definition of a filtration will not
prove problematic (one can always normalize such filtrations
if necessary). Note that from Eq. (48) $\mathcal{T}_c = \mathcal{T}\!\left(\frac{\nu_c}{1-\nu_c}\right)$.

Lemma 5. Filtering operations $\mathcal{T}_A \mathcal{I}_B$ on $P_{AB}$ cannot
increase $\vartheta[P_{AB}]$.

Proof: We first note, as in the Proof to Corollary 2, that
$\vartheta[P_{AB}]$ is a variation over
$\omega = \frac{P(a_0,b_1)\,P(a_1,b_0)}{P(a_0,b_0)\,P(a_1,b_1)}$ for all
$a_0, a_1 \in \{0, 1, \ldots, d_A+1\}$ and $b_0, b_1 \in \{0, 1, \ldots, d_B+1\}$ and it picks
out the minimum $\omega$. When $\vartheta$ is at a maximum, $\omega$ is at
a minimum. It is $\omega$ that we will consider in the following.

For a given distribution, $P_{AB}$, $\omega$ takes a minimum for a
particular set of values $(a_0 = a_0^\circ, a_1 = a_1^\circ, b_0 = b_0^\circ, b_1 = b_1^\circ)$.
Two cases can occur with regards to $(a_0^\circ, a_1^\circ, b_0^\circ, b_1^\circ)$:

1) $a_0^\circ = 0$ and, or $a_1^\circ = 0$

2) $a_0^\circ \ne 0$ and $a_1^\circ \ne 0$

Suppose, in Case 1, $a_0^\circ = 0$. After the filtering $\mathcal{T}_A \mathcal{I}_B$, $\omega$
becomes:

$$\omega(r) = \frac{\left(P(0,b_1^\circ) + r\,P(1,b_1^\circ)\right) P(a_1^\circ, b_0^\circ)}{\left(P(0,b_0^\circ) + r\,P(1,b_0^\circ)\right) P(a_1^\circ, b_1^\circ)} \tag{54}$$

Since we know that the particular set of values
$(a_0 = 0, a_1^\circ, b_0^\circ, b_1^\circ)$ are such as to minimize $\omega$, we know that
$\omega(r = 0) \le \omega(r \to \infty)$. It follows, noting how $\omega(r)$ depends on r,
that $\omega(r = 0) \le \omega(r)$. In this case $\mathcal{T}_A \mathcal{I}_B$ on $P_{AB}$ does not
decrease $\omega$.

Though applying $\mathcal{T}_A \mathcal{I}_B$ can only raise the $\omega$ corresponding
to the outputs $(a_0^\circ, a_1^\circ, b_0^\circ, b_1^\circ)$, it might be the case that this
operation lowers the $\omega$ value of other output sets. In fact,
the argument provided above is generic. It can be used to show
that $\mathcal{T}_A \mathcal{I}_B$ filtrations cannot yield an $\omega$ value lower than the
minimum before the filtration.

It follows that $\vartheta[P_{AB}] \ge \vartheta[\mathcal{T}_A \mathcal{I}_B P_{AB}]$.

Similar arguments can be used when $a_1^\circ = 0$ or indeed
$a_0^\circ = a_1^\circ = 0$.

Case 2 is simpler. The transformation $\mathcal{T}_A \mathcal{I}_B$ leaves
$(a_0^\circ, a_1^\circ, b_0^\circ, b_1^\circ)$, and the corresponding $\omega$, unaltered (recall
that $\omega$ is still valid for unnormalized distributions). In this
case $\vartheta[P_{AB}] = \vartheta[\mathcal{T}_A \mathcal{I}_B P_{AB}]$. Though other entries of the
distribution $P_{AB}$ will be changed by the filtration, arguments
with the same flavor as those used for Case 1 show that these
changes leave $\vartheta[P_{AB}]$ unaltered. ■
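Lemma 5 can be spot-checked on random examples. The sketch below applies the row action appearing in Eq. (54), in which row 0 of the distribution is replaced by row 0 plus r times row 1, and counts how often $\vartheta$ increases. Strictly positive random matrices are used to avoid the degenerate branch of Eq. (50); the function names, seed and sampling ranges are arbitrary choices.

```python
import random
from itertools import product
from math import sqrt

def theta(p):
    """Eq. (50) by brute force, for a matrix with strictly positive entries."""
    rows, cols = range(len(p)), range(len(p[0]))
    return max(
        1.0 / (1.0 + sqrt(p[a0][b1] * p[a1][b0] / (p[a0][b0] * p[a1][b1])))
        for a0, a1, b0, b1 in product(rows, rows, cols, cols)
    )

def apply_t(p, r):
    """Row action of Eq. (54): row 0 becomes row 0 + r * row 1; other rows are kept."""
    out = [row[:] for row in p]
    out[0] = [x0 + r * x1 for x0, x1 in zip(p[0], p[1])]
    return out

random.seed(0)
violations = 0
for _ in range(200):
    p = [[random.uniform(0.05, 1.0) for _ in range(3)] for _ in range(4)]
    r = random.uniform(0.0, 5.0)
    if theta(apply_t(p, r)) > theta(p) + 1e-12:
        violations += 1
```

No violation is expected: each new ratio $\omega(r)$ is of the form $(x + r y)/(z + r w)$ and therefore lies between two ratios that were already available before the filtering.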

It follows by symmetry that identical statements hold for
filtrations of the form $\mathcal{I}_A \mathcal{T}_B$.

We will now make a definition which will be used in the
following Lemma.

$$\mathcal{G}' = \mathcal{I} + r\, d_0 d_c^\dagger. \tag{55}$$

Note that $\mathcal{G}'$ is very close to $\mathcal{G}_c$ as defined in Eq. (36).

Lemma 6. Filtering operations of the form $\mathcal{G}'_A \mathcal{I}_B$ on $P_{AB}$
cannot increase $\vartheta[P_{AB}]$.

Proof: This proof is very similar to the proof for the
preceding Lemma. We consider the quantity $\omega$ again. There
will be an optimal set of outputs $(a_0^\circ, a_1^\circ, b_0^\circ, b_1^\circ)$ for which
$\omega$ takes a minimum. This time the two cases that need to be
considered are:

1) $a_0^\circ = c$ and, or $a_1^\circ = c$

2) $a_0^\circ \ne c$ and $a_1^\circ \ne c$

In Case 1, suppose $a_0^\circ = c$. After the filtering $\mathcal{G}'_A \mathcal{I}_B$, $\omega$ becomes:

$$\omega(r) = \frac{\left(P(c,b_1^\circ) + r\,P(0,b_1^\circ)\right) P(a_1^\circ, b_0^\circ)}{\left(P(c,b_0^\circ) + r\,P(0,b_0^\circ)\right) P(a_1^\circ, b_1^\circ)} \tag{56}$$

Now, as in the proof of Lemma 5, one uses the fact that
$\omega(r = 0) \le \omega(r \to \infty)$ to show that $\omega(r = 0) \le \omega(r)$. The rest of this
proof follows along the same lines as the proof for Lemma 5.



C. Proof of Theorem 4 

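In this section we will prove that $\Lambda[P_{AB}] = \vartheta[P_{AB}]$.
It is straightforward to see that, for all $\mathcal{D}_A$ and $\mathcal{F}_B$,
$\lambda[\mathcal{D}_A \mathcal{F}_B P_{AB}] \le \Lambda_r[\mathcal{D}_A \mathcal{F}_B P_{AB}]$. From the last section we
note that $\Lambda_r[\mathcal{D}_A \mathcal{F}_B P_{AB}] = \vartheta[\tilde{\mathcal{D}}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB}]$. In this section
we prove that $\vartheta[\tilde{\mathcal{D}}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB}] \le \vartheta[\tilde{P}_{AB}] = \vartheta[P_{AB}]$. It follows
that $\lambda[\mathcal{D}_A \mathcal{F}_B P_{AB}] \le \vartheta[P_{AB}]$ for all $\mathcal{D}_A$ and $\mathcal{F}_B$, which implies
that $\Lambda[P_{AB}] \le \vartheta[P_{AB}]$. On the other hand, the function
$\vartheta[P_{AB}]$ is the secret-bit fraction obtained with a particular
(reversible) processing of $P_{AB}$, therefore $\vartheta[P_{AB}] \le \Lambda[P_{AB}]$.
The previous two inequalities imply $\Lambda[P_{AB}] = \vartheta[P_{AB}]$, which
is the statement of Theorem 4.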
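The upper bound can be probed numerically by drawing random filters for Alice and Bob and comparing $\vartheta$ of the filtered $2 \times 2$ output with $\vartheta[P_{AB}]$. This is a spot check under illustrative assumptions (strictly positive entries, hypothetical function names, arbitrary seed and sample size), not a substitute for the proof.

```python
import random
from itertools import product
from math import sqrt

def theta(p):
    """Eq. (50) by brute force, for a matrix with strictly positive entries."""
    rows, cols = range(len(p)), range(len(p[0]))
    return max(
        1.0 / (1.0 + sqrt(p[a0][b1] * p[a1][b0] / (p[a0][b0] * p[a1][b1])))
        for a0, a1, b0, b1 in product(rows, rows, cols, cols)
    )

def random_filter(n_inputs):
    """Hypothetical 2 x n filter; each column has positive entries summing to at most 1."""
    return [[random.uniform(0.01, 0.5) for _ in range(n_inputs)] for _ in range(2)]

def filter_both(p, d_filter, f_filter):
    """Un-normalized 2 x 2 output when Alice applies d_filter and Bob applies f_filter."""
    return [[sum(d_filter[i][a] * f_filter[j][b] * p[a][b]
                 for a in range(len(p)) for b in range(len(p[0])))
             for j in range(2)] for i in range(2)]

random.seed(1)
p = [[random.uniform(0.05, 1.0) for _ in range(3)] for _ in range(3)]
bound = theta(p)
best_found = max(theta(filter_both(p, random_filter(3), random_filter(3)))
                 for _ in range(500))
```

Random filters typically stay well below `bound`; the bound itself is reached by the reversible processing in which each party keeps only the two optimal outcomes, which is the second inequality used above.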



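The approach uses the decomposition found in Appendix
II combined with the preceding lemmas to show that all
filtrations will either lower $\vartheta[\tilde{P}_{AB}]$ or leave it the same.
Filtrations $\tilde{\mathcal{D}}_A \tilde{\mathcal{F}}_B$ will be expressed as products of operations
$\mathcal{Q}^{(1)}_A \mathcal{I}_B\, \mathcal{Q}^{(2)}_A \mathcal{I}_B\, \mathcal{Q}^{(3)}_A \mathcal{I}_B \cdots \mathcal{Q}^{(M)}_A \mathcal{I}_B\, \mathcal{I}_A \tilde{\mathcal{F}}_B$.
We then show that

$$\vartheta[\mathcal{Q}^{(1)}_A \mathcal{I}_B\, \mathcal{Q}^{(2)}_A \mathcal{I}_B\, \mathcal{Q}^{(3)}_A \mathcal{I}_B \cdots \mathcal{Q}^{(M)}_A \mathcal{I}_B\, \mathcal{I}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB}] \le \vartheta[\mathcal{Q}^{(2)}_A \mathcal{I}_B\, \mathcal{Q}^{(3)}_A \mathcal{I}_B \cdots \mathcal{Q}^{(M)}_A \mathcal{I}_B\, \mathcal{I}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB}]$$
$$\le \vartheta[\mathcal{Q}^{(3)}_A \mathcal{I}_B \cdots \mathcal{Q}^{(M)}_A \mathcal{I}_B\, \mathcal{I}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB}] \le \cdots \le \vartheta[\mathcal{I}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB}].$$

Similar arguments can then be used to show
$\vartheta[\mathcal{I}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB}] \le \vartheta[\mathcal{I}_A \mathcal{I}_B \tilde{P}_{AB}]$.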

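Proof: The following shows that $\vartheta[\tilde{\mathcal{D}}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB}] \le \vartheta[\tilde{P}_{AB}]$.
Consider the filtration operations $\tilde{\mathcal{D}}_A$, $\tilde{\mathcal{F}}_B$. Each can be
decomposed according to Eq. (39). We note, using Eq. (39)
to expand $\tilde{\mathcal{D}}_A$, that:

$$\tilde{\mathcal{D}}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB} = \mathcal{W}_{d_A+1,A} \mathcal{I}_B\; \mathcal{G}_{d_A+1,A} \mathcal{I}_B\; \mathcal{W}_{d_A,A} \mathcal{I}_B \cdots \mathcal{G}_{2,A} \mathcal{I}_B\; \mathcal{C}_A \mathcal{I}_B\; \mathcal{I}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB} \tag{57}$$

Each of the $\mathcal{W}_c$ can be decomposed further using Eq. (44).
Eqs. (58-61) are successive re-writings of Eq. (57) which will
prove useful.

$$\tilde{\mathcal{D}}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB} = \mathcal{W}_{d_A+1,A} \mathcal{I}_B\; \mathcal{G}_{d_A+1,A} \mathcal{I}_B\; P'_{AB} \tag{58}$$
$$= \mathcal{W}_{d_A+1,A} \mathcal{I}_B\; P''_{AB}$$
$$= \mathcal{K}^{(1)}_{d+1,A} \mathcal{I}_B\; \mathcal{T}_{d+1,A} \mathcal{I}_B\; \mathcal{K}^{(2)}_{d+1,A} \mathcal{I}_B\; \mathcal{K}^{(3)}_A \mathcal{I}_B\; \mathcal{T}_{d+1,A} \mathcal{I}_B\; \mathcal{K}^{(3)}_A \mathcal{I}_B\; P''_{AB} \tag{59}$$
$$= \mathcal{K}^{(1)}_{d+1,A} \mathcal{I}_B\; \mathcal{T}_{d+1,A} \mathcal{I}_B\; P'''_{AB} \tag{60}$$
$$= \mathcal{K}^{(1)}_{d+1,A} \mathcal{I}_B\; P''''_{AB} \tag{61}$$

where $P'_{AB} = \mathcal{W}_{d_A,A} \mathcal{I}_B \cdots \mathcal{G}_{2,A} \mathcal{I}_B\; \mathcal{C}_A \mathcal{I}_B\; \mathcal{I}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB}$,
$P''_{AB} = \mathcal{G}_{d_A+1,A} \mathcal{I}_B\, P'_{AB}$,
$P'''_{AB} = \mathcal{K}^{(2)}_{d+1,A} \mathcal{I}_B\; \mathcal{K}^{(3)}_A \mathcal{I}_B\; \mathcal{T}_{d+1,A} \mathcal{I}_B\; \mathcal{K}^{(3)}_A \mathcal{I}_B\, P''_{AB}$
and finally $P''''_{AB} = \mathcal{T}_{d+1,A} \mathcal{I}_B\, P'''_{AB}$.

The operation $\mathcal{K}^{(1)}_{d+1,A}$ is reversible. It follows, using Lemma
4 and Eq. (61), that
$\vartheta[\tilde{\mathcal{D}}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB}] = \vartheta[\mathcal{K}^{(1)}_{d+1,A} \mathcal{I}_B\, P''''_{AB}] = \vartheta[P''''_{AB}]$.
We know that $\vartheta[P''''_{AB}] = \vartheta[\mathcal{T}_{d+1,A} \mathcal{I}_B\, P'''_{AB}]$ (since
$P''''_{AB} = \mathcal{T}_{d+1,A} \mathcal{I}_B\, P'''_{AB}$). Now, by using Lemma 5, it follows that
$\vartheta[\mathcal{T}_{d+1,A} \mathcal{I}_B\, P'''_{AB}] \le \vartheta[P'''_{AB}]$. It follows that
$\vartheta[\tilde{\mathcal{D}}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB}] = \vartheta[P''''_{AB}] = \vartheta[\mathcal{T}_{d+1,A} \mathcal{I}_B\, P'''_{AB}] \le \vartheta[P'''_{AB}]$.

Using Lemmas 4 and 5, and noting that $\mathcal{K}^{(2)}_{d+1,A}$
and $\mathcal{K}^{(3)}_A$ are reversible, we obtain
$\vartheta[\tilde{\mathcal{D}}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB}] = \vartheta[\mathcal{W}_{d_A+1,A} \mathcal{I}_B\, P''_{AB}] \le \vartheta[P'''_{AB}] \le \vartheta[P''_{AB}]$.

From Lemma 6 and the similarity of $\mathcal{G}_c$ to $\mathcal{G}'$ we find
that $\vartheta[P''_{AB}] = \vartheta[\mathcal{G}_{d_A+1,A} \mathcal{I}_B\, P'_{AB}] \le \vartheta[P'_{AB}]$. It follows that
$\vartheta[\tilde{\mathcal{D}}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB}] \le \vartheta[P'_{AB}]$.

If we look at the form of $P'_{AB}$ we find that the same decomposition
can be performed on the operations $\mathcal{W}_{d_A,A} \mathcal{I}_B\, \mathcal{G}_{d_A,A} \mathcal{I}_B$.
It is straightforward to use the above arguments to show that
$\vartheta[P'_{AB}] = \vartheta[\mathcal{W}_{d_A,A} \mathcal{I}_B\, \mathcal{G}_{d_A,A} \mathcal{I}_B\, P^{V}_{AB}] \le \vartheta[P^{V}_{AB}]$.

By repeated use of the above arguments and a study of Eq.
(57) one finds that $\vartheta[\tilde{\mathcal{D}}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB}] \le \vartheta[\mathcal{C}_A \mathcal{I}_B\, \mathcal{I}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB}]$.
Since $\mathcal{C}_A$ is reversible, by Lemma 4,
$\vartheta[\mathcal{C}_A \mathcal{I}_B\, \mathcal{I}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB}] = \vartheta[\mathcal{I}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB}]$.

Exactly the same arguments can be used to show that
$\vartheta[\mathcal{I}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB}] \le \vartheta[\tilde{P}_{AB}]$. It follows that
$\vartheta[\tilde{\mathcal{D}}_A \tilde{\mathcal{F}}_B \tilde{P}_{AB}] \le \vartheta[\tilde{P}_{AB}]$. ■

Noting the definition of the function $\vartheta$ and $\tilde{P}_{AB}$, Eq. (17)
follows.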



